ABSTRACT

Classical multiple regression was compared with 
Bayesian m-group regression, complete with cross-validation. The 
setting was a post-developmental studies situation in a comprehensive 
community college. A secondary purpose of the study was to 
incorporate an advisor prediction of grade point average (GPA) as 
input into both regression procedures. The reliability and predictive
validity of the advisor predictions were both investigated. One major
strength of the study was the inclusion of variables measuring 
progress during developmental studies. Predictions based solely on 
data available prior to developmental studies would invariably 
predict failure because it is those variables which suggested a need 
for developmental studies in the first place. A second major strength
of this study was the inclusion of an advisor prediction as a
variable in both regression methods. This inclusion maintained 
comparability between methods while allowing the inclusion of both
"hard" and "soft" data. The criterion variable in the study was
first-quarter GPA in the student's chosen curriculum, after the
student had completed developmental studies. (Author/CTM)
INTRODUCTION


Developmental studies programs have been one of the major identifying
features of the community colleges in the United States in the twentieth
century. These programs were deemed necessary by the general availability
of the "open-door" policy on admissions at these "people's colleges."
Since many of the students who were attracted to these new colleges were
underprepared by traditional academic standards, the community colleges
undertook an ever-increasing role in remediating the deficiencies of
these non-traditional students. "It was in the community college that
postsecondary efforts at remedial education became widespread two decades
ago" (Roueche and Snow, 1977, p. 4). Although many titles were assigned
to the new programs, the term "developmental studies" seems to have become
somewhat standard.

Since the open door policy precluded any major emphasis on admissions, 
the guidance of prospective students into suitable curricula has taken 
on extreme importance. As Novick (1970, p. 1) has said, "Decisions of
consequence for each student center largely around the choice of program
of study." Thus, in addition to developmental studies, the emphasis on 
guidance has been another identifying feature of the community colleges. 
Many counselors in the community colleges have been concerned that
non-traditional students would be unsuccessful in many curricula. In fact,
several books and articles in the early 1970's expressed the fear that
the open door has become simply a revolving door (Moore, 1970, 1971,
1976; Roueche and Kirk, 1974; Roueche and Snow, 1977).


To prevent the open door from becoming a revolving door, especially
for developmental students, the community college must give major
attention to the counseling of students during and after completion of
developmental studies (Bushnell, 1973, pp. 108-114). Often the student never
sees a counselor during the course of his studies. As a result, the
professional help which could have been offered is never made available.
In many cases it is left to the classroom instructor to assist the
developmental students in their choice of curriculum. Thus, the
non-traditional student, perhaps with some faculty assistance, has had to
apply whatever common-sense or rumor-mill data he could locate in his
search for a suitable curriculum in which he might succeed.

Even if counseling were available for developmental students, the 
problem of curriculum selection has been viewed primarily as a selection 
problem for admissions officers at universities rather than as a guidance 
problem for counselors and students at community colleges. Given the 
difficulties of curriculum selection after developmental studies at the 
community colleges, it seems that admissions procedures could and should 
be applied in this parallel situation (Novick and Jackson, 1974a, p. 81).
Henriksen (1973) applied such procedures to various curricula at a
single institution. His premise was that the results of "admissions"
procedures should be for the benefit of the student making the selection
of major field, rather than the admissions officer comparing potential
students for selective admissions.


Present Status of the Problem

Historically, the admissions problem has centered on predicting
grade point average (GPA) for the first term or first year of college
study using multiple regression procedures. The predictor variables have
almost always included high school grades and the scores obtained on
standardized tests. Other measures of the student have also been used,
with differing degrees of success. Some of the more unusual predictor
variables in recent studies were marital status of family, position in
family, and number of siblings (Chase and Johnson, 1977); geographic
area (Adams, et al, 1976); and parochial or non-parochial school
(Astin, 1971).

The purpose of the more recent studies has been to aid the potential
student in his selection of curriculum within a college as well as in his
selection of college. This version of the classic multiple regression
admissions procedure has the advantage of giving the student the results
of the analysis so that he may then input those results into his personal
decision-making process. For the new students in developmental studies
in the community colleges, however, the traditional analyses would
invariably forecast failure because the predictor variables used in the
analysis are the very same variables which relegated the student to
developmental studies in the first place (Moore, 1970, p. 7). Consequently,
it seems reasonable that the prediction should not be prepared until all
the relevant data are available. Data measuring progress in developmental
studies and the resulting higher capabilities of the student should be
included. Given the premise that developmental studies can help at least
some of the non-traditional students, it would be premature to
predict curriculum success without taking the developmental studies
process into account.

Problem Statement

Two methods have traditionally been used in selecting an appropriate 
curriculum and/or predicting success in that curriculum: classical 
statistical prediction and informal counselor prediction. Classical
statistical prediction uses multiple regression to develop a prediction
equation which is applied in exactly the same way to each student's data,
while informal counselor prediction allows a counselor to incorporate
beliefs and impressions along with data in predicting separately for
each student. The classical statistical prediction methods which utilize
multiple regression have three basic weaknesses for the kind of
application under consideration. First, there is concern for the special
individuals against whom the prediction is biased. As Novick and Jackson
(1970, p. 461) point out:

    When society adopts the formal classification model,
    it is satisfied because assignments, on the average, are
    then good. The student, however, is unconcerned with such
    average good. If he perceives that he belongs to some
    subgroup for which, on the average, poor assignment
    decisions are made, it will not comfort him to know that
    the system works well for almost everybody else.


Second, sample size would seldom be large enough to ensure validity
in classical statistical prediction models. Kerlinger and Pedhazur (1973,
p. 442) proposed that over 100, and preferably over 200, subjects are
needed to protect the validity of predictions. Samples of this size are
usually impossible for all but the very largest community colleges. Third,
the users of a classical statistical prediction model would often not
understand the model well enough to apply it properly. Counselors, advisors,
and especially the students themselves are seldom adept at interpreting
the output of statistical models.

The informal counselor prediction model allows, and even encourages, 
consideration of unique characteristics of individuals whose statistical 
description is not an accurate picture of potential. This model also 
has no requirement for minimum sample size and no background requirement 
for proper interpretation. However, the counselor predictions tend to 
lack reliability and cannot be adequately documented. As Houston (1976)
points out, counselor prediction models frequently fail to give
consistent, predictable, and dependable results and cannot always identify
what has really been measured.

A hybrid model that would utilize both counselor input and statistical
analysis would be more appropriate than either of the two pure models
(Houston, 1976, p. 6). Such a model is available in Bayesian m-group
regression, which was developed by M. R. Novick and his associates based on a
mathematical framework developed by D. V. Lindley. Bayesian m-group
regression uses an application of Bayes' Theorem to separate the statistical
analysis by groups to the extent allowed by the data in each group (Novick
and Jackson, 1974a, p. 79). This is equivalent to an informal counselor input
which recognizes differences between curricula. However, Houston (1976,
p. 103) recommended that the Bayesian model be extended "with the inputs
of certain counselors' evaluations as independent variables."


Purpose of the Study

The purpose of the study was to compare classical multiple
regression with Bayesian m-group regression, complete with cross-validation of
both methods. Novick and Jackson (1974a, p. 77), Houston (1976, p. 104),
Hinkle and Houston (1977), Henriksen (1973, p. 63), and Kerlinger and
Pedhazur (1973, pp. 282-284) have stressed a need for additional
cross-validation studies in order to check shrinkage of the multiple
correlation obtained in regression applications. The context of the study was
to predict first-quarter GPA for post-developmental students in various
curricula in a comprehensive community college. In addition to high school
data and standardized test scores, each student would also have data
representing his level of success in developmental studies.

A secondary purpose of the study was to incorporate a counselor
prediction of each student's GPA. The inclusion of these input data
into both the classical multiple regression and the Bayesian m-group
regression was specifically suggested by Houston (1976, p. 103) and
Hinkle and Houston (1977) and also meets the general suggestion of
Novick and Jackson (1970, p. 89). In the present study, a counselor
prediction was incorporated into both prediction methods and the
reliability and predictive validity of the counselor predictions were
investigated.


Subjects 


Two groups of subjects were needed for this study. The first group,
from which the prediction equations were developed, consisted of those
students who completed developmental studies and then finished at least one
quarter of their chosen curriculum at a comprehensive community college
between Fall, 1974, and Spring, 1978. This group was called the
screening group. The second group, to which the prediction equations
were applied, consisted of those post-developmental students who completed
one quarter of their chosen curriculum at the community college during
Summer or Fall, 1978. This group was called the calibration group.
These two groups were necessary for cross-validation.


Regression Analyses

Bayesian m-group regression was performed with a FORTRAN program
developed by Shigemasu (1976) entitled "Bayesian M-Group Regression Analysis
with Identical Beta." An assumption of equal slopes across m-groups was
incorporated by Shigemasu (1976) as a simplification of Bayesian m-group
regression. This assumption says that the regression coefficients of
each predictor variable in the regression equations are independent of
groups. This means that the impact of any variable on GPA (for example)
is the same, or very nearly the same, in each group. The only regression
parameter allowed to change across groups is the intercept, the regression
constant. Shigemasu states his belief that "this equal-slope model is
a realistic, reasonable specification for many applications in academic
prediction" (1976, p. 158). The two primary advantages of this
equal-slope assumption are that all data from all groups may be used in
estimating the slopes (likely increasing the precision of estimation) and that
computation time is significantly reduced (likely increasing the
availability of Bayesian methods) (Shigemasu, 1976, p. 158).
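
In symbols, the equal-slope specification can be sketched as follows. The notation is introduced here for illustration only and is not taken from Shigemasu (1976): y_ij is the GPA of student i in group j, x_ijk is that student's score on predictor k, alpha_j is the intercept of group j, beta_k is the slope of predictor k, and epsilon_ij is the error term.

```latex
y_{ij} = \alpha_j + \sum_{k=1}^{p} \beta_k \, x_{ijk} + \varepsilon_{ij},
\qquad j = 1, \dots, m
```

Only the intercept carries a group subscript; each slope is shared by all m groups, which is why data from every group contribute to its estimate.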

Classical multiple regression analysis was performed with the
REGRESSION subprogram of the Statistical Package for the Social Sciences
(for OS/360, Version M, Release 8.0, January, 1979) (Nie et al, 1975).
The final regression model was determined by using two backward deletion
methods (Hinkle, 1979, p. 405). The first of these methods was to
compare the restricted regression model following the systematic deletion
of a variable to the original full model. The second method involved
comparing the restricted model to the full model of the previous step.
The rationale for using the two methods is that "it would be possible
to delete variables that singly do not make a significant difference,
but collectively account for a significant portion of the variance"
(Hinkle, 1979, p. 405).
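
The two comparisons can be illustrated with a minimal Python sketch. It assumes the usual F statistic for a change in R² between nested models; the text does not say which statistic SPSS reported. The function name `f_change` and the R² values in `steps` are hypothetical and are not results from the study; only N = 399 and the starting count of 17 predictors come from the text.

```python
def f_change(r2_full, r2_restricted, n, k_full, k_restricted):
    """F statistic for the loss in R^2 when a full model with k_full
    predictors is reduced to a nested model with k_restricted predictors."""
    df1 = k_full - k_restricted          # number of deleted variables
    df2 = n - k_full - 1                 # residual df of the full model
    return ((r2_full - r2_restricted) / df1) / ((1.0 - r2_full) / df2)

n_students = 399                         # screening group size from the text
# Hypothetical (number of predictors, R^2) pairs after each deletion step.
steps = [(17, 0.300), (16, 0.299), (15, 0.297), (14, 0.294)]
k_orig, r2_orig = steps[0]

comparisons = []
for (k_prev, r2_prev), (k, r2) in zip(steps, steps[1:]):
    # Method 1: restricted model against the original full model.
    f_vs_original = f_change(r2_orig, r2, n_students, k_orig, k)
    # Method 2: restricted model against the model of the previous step.
    f_vs_previous = f_change(r2_prev, r2, n_students, k_prev, k)
    comparisons.append((k, f_vs_original, f_vs_previous))
    print(k, round(f_vs_original, 2), round(f_vs_previous, 2))
```

Method 2 tests each deletion singly, while Method 1 accumulates the loss over all deletions so far, which is how it can catch variables that matter only collectively.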
RESULTS

The final regression model included the following predictor variables:

1. Sex of the respondent (SEX)
2. Curriculum change during developmental studies (Yes or No)
   (CHANGE)
3. Comparative Guidance and Placement (CGP) Test
   a. Reading (READ)
   b. Sentences (SENT)
   c. Mathematics (MATH)
4. Counselor Prediction (PRED)


Cross-Validation 
After completion of classical multiple regression and Bayesian
regression analyses on the screening group (N = 399), the two prediction
equations were applied to the calibration group (N = 45). The subjects
assigned to the calibration group were those students who completed one
quarter post-developmental during Summer or Fall, 1978. Using the last
group of subjects as the calibration group mirrors the real-life
application of GPA prediction studies, i.e., last year's students' scores
generate a prediction model which is applied to this year's students.
Since the classical regression equation is a least-squares best fit
on screening data, its application to calibration cases is expected to
show a decreased R². In this study, multiple R² shrank from 0.281 on
screening to 0.271 on calibration. On the other hand, since the Bayesian
regression equation adjusts coefficients and intercepts toward the grand
mean values, Bayesian R² values can be expected to exhibit more stability
when applied to calibration cases. In this study, Bayesian R² actually
increased from 0.275 on screening to 0.279 on calibration.
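
The shrinkage figures can be reproduced with a short Python sketch. It assumes that the calibration R² is the squared correlation between predicted and actual GPA in the calibration group, which the text does not state explicitly. The function names are hypothetical; the four R² values are those reported in the text.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_validated_r2(predicted_gpa, actual_gpa):
    """Squared correlation of predictions (from the screening equation)
    with the actual GPAs of the calibration group."""
    return pearson_r(predicted_gpa, actual_gpa) ** 2

def shrinkage(r2_screening, r2_calibration):
    """Positive values mean R^2 dropped on cross-validation."""
    return r2_screening - r2_calibration

# R^2 values reported in the text.
classical_shrinkage = shrinkage(0.281, 0.271)
bayesian_shrinkage = shrinkage(0.275, 0.279)
print(round(classical_shrinkage, 3), round(bayesian_shrinkage, 3))
```

A negative shrinkage, as in the Bayesian case, means the equation fit the calibration group slightly better than the group it was estimated on.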


Comparing Multiple Regression with Bayesian Regression 

With the assumption of equal slopes across m-groups but different
intercepts for each group, classical multiple regression analysis was
performed on screening group data. The criterion variable, first
quarter curriculum GPA, was regressed on the total set of predictor
variables, which included six of the original predictor variables and
the five dummy variables. Regression coefficients (b) and
beta-coefficients for the least-squares hyperplane are given in Table 1. In
addition to b and beta-coefficients, Table 1 also reports an R² value of 0.281
for classical multiple regression with equal slopes.

With the FORTRAN program developed by Shigemasu, taking the
multiple regression slopes and intercepts as initial values, a set of
Bayesian slopes and intercepts was developed for screening group
data. Coefficients for the Bayesian regression hyperplane are also
given in Table 1. Also given is an R² of 0.275 for Bayesian m-group
regression with equal slopes.
Coefficients for predictor variables are quite similar in both
regression models. However, for each dummy variable, the Bayesian
regression coefficient is closer to zero than the multiple regression
coefficient. It should be noted that these coefficients for the dummy
variables display the Bayesian assumption: intercepts for m-groups should
be modified in the direction of the grand mean intercept. According
to Lindley and Smith (1972, p. 16), they "tend to be 'shrunk' towards a
common value."
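
The claim can be checked directly against the dummy-variable coefficients (b) in Table 1, as in the Python sketch below. Where the scan leaves the sign of a coefficient unclear, the sign is assumed here to agree with the reported beta for the same variable; the comparison itself uses absolute values and does not depend on that assumption. The function name is hypothetical.

```python
# Dummy-variable coefficients (b) from Table 1; signs assumed to match beta.
classical = {"DUMA": -0.006, "DUMB": 0.077, "DUMC": -0.010,
             "DUMD": -0.329, "DUME": 0.114}
bayesian = {"DUMA": -0.003, "DUMB": 0.022, "DUMC": -0.002,
            "DUMD": -0.044, "DUME": 0.041}

def closer_to_zero(shrunk, unshrunk):
    """For each variable, report whether the shrunk coefficient is
    smaller in absolute value than the unshrunk one."""
    return {name: abs(shrunk[name]) < abs(unshrunk[name]) for name in unshrunk}

result = closer_to_zero(bayesian, classical)
all_shrunk = all(result.values())
print(result, all_shrunk)
```

Because each dummy coefficient is a group's departure from the reference intercept, a smaller absolute value means that group's intercept sits nearer the common value.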

To compare the results of multiple regression with the results of
Bayesian regression either in the screening group or in the calibration
group, a dependent t-test for correlated samples was used (Ferguson,
1976, p. 185). These data indicate (see Table 2) that R² values for the
screening group are not significantly different. With a classical R²
of 0.281 and a Bayesian R² of 0.275, the t-value is 0.92, with an
associated probability of 0.356. Similarly, for the calibration group,
the classical R² of 0.271 and the Bayesian R² of 0.279 are not
significantly different (t = 0.23, p = 0.816).


Mean Errors; Mean Absolute Errors; Mean Squared Errors 


In addition to the test of R² to compare Bayesian m-group regression
with classical multiple regression, tests were performed on
mean-error-loss, absolute-error-loss, and squared-error-loss (Novick and
Jackson, 1974b). Each of these tests begins by calculating the error of
prediction for each student, i.e., predicted GPA minus actual GPA.
Mean-error-loss simply computes the mean of these errors across all students.
Absolute-error-loss computes the mean of the absolute values of the errors, and
squared-error-loss computes the mean of the squares of the errors. All
statistical comparisons were performed with dependent t-tests (Ferguson,
1976, p. 180). It is most interesting to note in Table 2 that none
of the comparisons showed any statistically significant difference between
classical multiple regression and Bayesian m-group regression.
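
The three loss measures and the dependent t-test can be sketched in Python as follows. This is a minimal illustration, not the program used in the study: the function names and the small GPA lists are hypothetical, and the t statistic is the standard paired-difference form, assumed here to correspond to the dependent t-test cited from Ferguson.

```python
import math

def losses(predicted, actual):
    """Mean error, mean absolute error, and mean squared error,
    with error defined as predicted GPA minus actual GPA."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "mean_sq_error": sum(e * e for e in errors) / n,
    }

def paired_t(x, y):
    """Dependent (paired) t statistic for two lists measured on the same
    students; df = n - 1. Undefined if all differences are identical."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Hypothetical data for four students (not from the study).
actual = [2.0, 3.0, 2.5, 1.5]
classical_pred = [2.5, 2.5, 2.5, 2.0]
bayesian_pred = [2.4, 2.6, 2.6, 1.9]

# Compare the two methods on absolute-error-loss, student by student.
abs_classical = [abs(p - a) for p, a in zip(classical_pred, actual)]
abs_bayesian = [abs(p - a) for p, a in zip(bayesian_pred, actual)]
print(losses(classical_pred, actual))
print(losses(bayesian_pred, actual))
print(paired_t(abs_classical, abs_bayesian))
```

The test is dependent because both methods predict for the same students, so each student contributes one pair of losses.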

Predictive Validity of Advisor Predictions

Two methods were used to determine the predictive validity of advisor
predictions in this study. In the first method, the product-moment
correlation between committee prediction (PRED) and actual GPA, for all
subjects in both screening and calibration groups (N = 444), was found
to be 0.457. The second method involved the inclusion of advisor
predictions to determine whether the magnitude of the multiple correlation
coefficient would increase in either regression method. In classical
multiple regression, the final model resulted in an R² of 0.281. The
same model with PRED deleted had an R² of 0.258. This statistically
significant difference in R² demonstrated the predictive validity of
advisor predictions. In addition, PRED had the largest standardized
regression coefficient (beta), a measure of relative level of
contribution, of any of the predictor variables in the study. Thus, not only
did the inclusion of PRED increase R², but also PRED was the largest
contributor to the final regression model.
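
The text does not name the test behind this comparison. One common choice is the F test for an increment in R² between nested models. The worked arithmetic below assumes that test, and assumes that the full model has k = 11 predictors (the six predictor variables plus the five dummy variables) fitted to the N = 399 screening cases.

```latex
F = \frac{(R^2_{\mathrm{full}} - R^2_{\mathrm{restricted}}) / 1}
         {(1 - R^2_{\mathrm{full}}) / (N - k - 1)}
  = \frac{0.281 - 0.258}{(1 - 0.281) / (399 - 11 - 1)}
  = \frac{0.023}{0.719 / 387}
  \approx 12.4
```

Under these assumptions the statistic has 1 and 387 degrees of freedom, which is consistent with the statistically significant difference reported.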
Limitations of the Study

There were several limitations in this study. The first, and
probably most critical, was the use of Shigemasu's equal-slope
assumption. Use of the general Bayesian regression method without this
assumption may have produced different results, because it would have allowed
for different slopes for the same variable in different m-groups. If the
"true" relationship between a given variable and GPA changes across the
m-groups, then this equal-slope assumption would be tenuous, and the
predictive validity of the regression equation would be automatically
lower than it would have been using a general Bayesian m-group regression.
Nevertheless, Shigemasu (1976, p. 158) justified his assumption and
successfully tested it. In addition, the size of the groups in the present
study was insufficient for a general Bayesian m-group regression.

Sample size was a second limitation in the present study, in that
there were 399 students in the screening group and 45 students in the
calibration group. Since the most liberal requirement (Gorsuch, 1974,
p. 296) calls for five times as many students as variables, there was
no possible justification for retaining all 17 predictor variables.
Consequently, the predictive validity of the final regression models in
the present study was limited by the number of permissible variables.

A third limitation was the method of grouping curricula. Six m-groups
were established in the present study based on similarities in curriculum
courses and on the level of mathematics required in developmental studies
prior to curriculum entry. Although such a grouping can be logically
defended, there are still some differences within groups which could
threaten the validity of GPA predictions.

A fourth limitation in this study was the use of "blind" data for
the counselor prediction. Even though the counselors' predictions proved to
be quite reliable in predicting first-quarter GPA, questions still remain
about how much better, or worse, the counselor predictions could have
been following face-to-face contact with each student.
Strengths of This Study


One major strength of this study was the inclusion of variables
measuring progress during developmental studies. Predictions based solely
on data available prior to developmental studies would invariably predict
failure because it is those variables which suggested a need for
developmental studies in the first place (Moore, 1970, p. 7).

A second major strength of this study was the inclusion of a counselor
prediction as a variable in both regression methods. This inclusion
maintained comparability between methods while allowing the inclusion
of both "hard" and "soft" data. This is the sort of compromise needed
between classical statistical models and counseling models (Houston,
1976, p. 6).
TABLE 1


Coefficients of Regression Equations 


                  Classical              Bayesian
Variable         b        beta         b        beta
PRED            0.350     0.208       0.369     0.215
MATH            0.018     0.135       0.018     0.135
READ            0.012     0.121       0.012     0.128
SENT            0.012     0.112       0.012     0.108
CHANGE         -0.243    -0.106      -0.250    -0.110
SEX             0.148     0.068       0.163     0.075
DUMA           -0.006    -0.002      -0.003    -0.001
DUMB            0.077     0.026       0.022     0.007
DUMC           -0.010    -0.003      -0.002    -0.001
DUMD           -0.329    -0.077      -0.044    -0.010
DUME            0.114     0.048       0.041     0.017
Constant       -0.631     0          -0.678     0
R²              0.281                 0.275


TABLE 2 


CORRELATION COEFFICIENTS, MEAN ERRORS,
MEAN ABSOLUTE ERRORS, MEAN SQUARED ERRORS

Group          Measure            Classical   Bayesian     t       p
Screening      R²                   0.281      0.275      0.92    0.356
(N = 399)      mean error           0.000      0.000      0.09    0.93
               mean abs. error      0.725      0.729      0.84    0.403
               mean sq. error       0.836      0.844      0.80    0.425
Calibration    R²                   0.271      0.279      0.23    0.816
(N = 45)       mean error          -0.014     -0.048      1.78    0.083
               mean abs. error      0.734      0.714      1.05    0.299
               mean sq. error       0.919      0.908      0.19    0.848